Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

نویسندگان

Daniel R. Jiang

Lina Al-Kanj

Warren B. Powell

چکیده

Monte Carlo Tree Search (MCTS), most famously used in game-play artificial intelligence (e.g., the game of Go), is a well-known strategy for constructing approximate solutions to sequential decision problems. Its primary innovation is the use of a heuristic, known as a default policy, to obtain Monte Carlo estimates of downstream values for states in a decision tree. This information is used to iteratively expand the tree towards regions of states and actions that an optimal policy might visit. However, to guarantee convergence to the optimal action, MCTS requires the entire tree to be expanded asymptotically. In this paper, we propose a new technique called Primal-Dual MCTS that utilizes sampled information relaxation upper bounds on potential actions, creating the possibility of “ignoring” parts of the tree that stem from highly suboptimal choices. This allows us to prove that despite converging to a partial decision tree in the limit, the recommended action from Primal-Dual MCTS is optimal. The new approach shows significant promise when used to optimize the behavior of a single driver navigating a graph while operating on a ride-sharing platform. Numerical experiments on a real dataset of 7,000 trips in New Jersey suggest that Primal-Dual MCTS improves upon standard MCTS by producing deeper decision trees and exhibits a reduced sensitivity to the size of the action space.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fast kernel conditional density estimation: A dual-tree Monte Carlo approach

We describe a fast, data-driven bandwidth selection procedure for kernel conditional density estimation (KCDE). Specifically, we give aMonte Carlo dual-tree algorithm for efficient, error-controlled approximation of a cross-validated likelihood objective. While exact evaluation of this objective has an unscalableO(n2) computational cost, ourmethod is practical and shows speedup factors as high ...

متن کامل

Monte Carlo Tree Search in Simultaneous Move Games with Applications to Goofspiel

Monte Carlo Tree Search (MCTS) has become a widely popular sampled-based search algorithm for two-player games with perfect information. When actions are chosen simultaneously, players may need to mix between their strategies. In this paper, we discuss the adaptation of MCTS to simultaneous move games. We introduce a new algorithm, Online Outcome Sampling (OOS), that approaches a Nash equilibri...

متن کامل

Monte-Carlo Tree Search: To MC or to DP?

State-of-the-art Monte-Carlo tree search algorithms can be parametrized with any of the two information updating procedures: MC-backup and DP-backup. The dynamics of these two procedures is very different, and so far, their relative pros and cons have been poorly understood. Formally analyzing the dependency of MCand DP-backups on various MDP parameters, we reveal numerous important issues that...

متن کامل

Monte-Carlo Expression Discovery

Monte-Carlo Tree Search is a general search algorithm that gives good results in games. Genetic Programming evaluates and combines trees to discover expressions that maximize a given fitness function. In this paper Monte-Carlo Tree Search is used to generate expressions that are evaluated in the same way as in Genetic Programming. Monte-Carlo Tree Search is transformed in order to search expres...

متن کامل

Smooth UCT Search in Computer Poker

Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information twoplayer games. Although these methods have been extended to imperfect-information games, so far they have not achieved the same level of practical success or theoretical convergence guarantees as competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1704.05963 شماره

صفحات -

تاریخ انتشار 2017

Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

نویسندگان

چکیده

منابع مشابه

Fast kernel conditional density estimation: A dual-tree Monte Carlo approach

Monte Carlo Tree Search in Simultaneous Move Games with Applications to Goofspiel

Monte-Carlo Tree Search: To MC or to DP?

Monte-Carlo Expression Discovery

Smooth UCT Search in Computer Poker

عنوان ژورنال:

اشتراک گذاری